Summary. We begin with an interpretation of the $L_1$-distance between two power
spectral densities and then, following an analogous rationale, we develop a natural
metric for quantifying distance between respective covariance matrices.

1 Introduction 

Consider two discrete-time, stationary, zero-mean, (real-valued for notational
convenience) random processes $y_k$ and $\hat y_k$ ($k \in \mathbb{Z}$) having power
spectral densities $f_y(\theta)$ and $f_{\hat y}(\theta)$ ($\theta \in [-\pi, \pi]$), and
autocorrelation functions $R_\ell$ and $\hat R_\ell$ ($\ell \in \mathbb{Z}$), respectively, i.e.,

$$R_\ell := E\{y_k y_{k+\ell}\} = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j\ell\theta} f_y(\theta)\, d\theta$$

and similarly for the "hatted" quantities. When the power spectrum contains
a singular part, then $f(\theta)\,d\theta$ needs to be replaced by a non-negative finite
spectral measure $d\mu(\theta)$.

We are interested in quantifying the distance between respective spectra
and statistics for two such random processes $y_k$ and $\hat y_k$. When two vectors

$$R_n := [R_0\ R_1\ \ldots\ R_{n-1}]', \quad\text{and}\quad \hat R_n := [\hat R_0\ \hat R_1\ \ldots\ \hat R_{n-1}]'$$

of autocorrelation samples are available and need to be compared, one may
use any metric in $\mathbb{R}^n$ for that purpose, as for instance

$$\|R_n - \hat R_n\|_2 = \sqrt{\sum_{\ell=0}^{n-1} (R_\ell - \hat R_\ell)^2}.$$

However, we are not aware of any significance that can be attached to such a
distance other than the fact that it is a metric in $\mathbb{R}^n$. Our goal in this
paper is to seek a metric which can be physically motivated.
Similarly, if we are to compare $f_y(\theta)$ and $f_{\hat y}(\theta)$, it appears difficult
to motivate the use of an $L_2$-distance $\|f_y(\theta) - f_{\hat y}(\theta)\|_2$. For one
thing, the $L_2$-distance cannot be generalized to deal with spectral measures when
singular parts are present. There are certainly other alternatives. In the speech
processing literature in particular there is a plethora of distances that, however,
are not metrics [B] but have been motivated by specific needs. Function theoretic
alternatives that one can use (e.g., $L_p$-norms, etc.), including Wasserstein-like
transportation measures, typically lack a physical interpretation. In a recent
study [H [S] a pseudometric was constructed as a geodesic between spectral
densities/measures with respect to a rather natural Riemannian metric. This
metric quantifies the degradation of predictive-error variance when the predictor
is designed based on the wrong choice between two alternatives, and the geometry
is, in essence, Euclidean but only after we transform spectral densities using the
logarithmic map.

In the current paper we focus on the $L_1$ distance

$$\|f_y - f_{\hat y}\|_1 := \frac{1}{2\pi}\int_{-\pi}^{\pi} |f_y(\theta) - f_{\hat y}(\theta)|\, d\theta$$

which also has a rather natural interpretation. After a brief discussion of the
relevance of the $L_1$-distance, following a similar rationale, we will develop an
analogous metric between finite partial covariance data of the corresponding
random processes.

2 Interpretation of the $L_1$ distance

Given $y_k$ and $\hat y_k$ we postulate that there exist two random processes $\psi_k$
and $\hat\psi_k$ so that

$$y_k + \psi_k = \hat y_k + \hat\psi_k. \tag{1}$$

Alternatively, we postulate that there exists a random process $z_k$ and that
the two original random processes relate to $z_k$ via

$$y_k = z_k - \psi_k$$

$$\hat y_k = z_k - \hat\psi_k.$$

It is natural to seek such perturbations of minimal total combined variance

$$E\{\psi_k^2\} + E\{\hat\psi_k^2\} \tag{2}$$

that is sufficient to "reconcile" the two processes. The combined variance
$E\{\psi_k^2\} + E\{\hat\psi_k^2\}$ represents the minimal amount of "energy" of
perturbations in the two time-series that is needed to render the two
indistinguishable. Intuitively, the minimal combined variance which is consistent
with the available data quantifies the distance between the two.



Distances between time-series & statistics 






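Given $f_y$, $f_{\hat y}$, the optimal choice consists of random processes $\psi_k$ and
$\hat\psi_k$ such that $\psi_k$ and $\hat\psi_k$ are independent, $y_k$ and $\psi_k$ are
also independent, and

$$f_\psi(\theta) = \begin{cases} f_{\hat y}(\theta) - f_y(\theta) & \text{if } f_y(\theta) - f_{\hat y}(\theta) < 0,\\ 0 & \text{otherwise,} \end{cases} \tag{3a}$$

$$f_{\hat\psi}(\theta) = \begin{cases} f_y(\theta) - f_{\hat y}(\theta) & \text{if } f_{\hat y}(\theta) - f_y(\theta) < 0,\\ 0 & \text{otherwise.} \end{cases} \tag{3b}$$

Then, the power spectrum of the "sum"

$$z_k := y_k + \psi_k = \hat y_k + \hat\psi_k$$

is simply

$$f_z(\theta) := \max\{f_y(\theta), f_{\hat y}(\theta)\}, \quad \theta \in [-\pi, \pi],$$

and

$$d(f_y, f_{\hat y}) := E\{\psi_k^2\} + E\{\hat\psi_k^2\} \tag{4}$$

$$= \|f_y - f_{\hat y}\|_1. \tag{5}$$

This construction extends in the obvious way to the case of not necessarily
absolutely continuous power spectra as well, and the metric includes the measure
of any discrepancies between the singular parts of the two spectral measures.$^1$
Clearly, $d(f_y, f_{\hat y})$ is a metric as seen from (5). Building on a similar
rationale, in the next section, we develop a metric for covariance matrices.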
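
The construction in (3a)-(5) can be checked numerically. The following is a minimal sketch, assuming the two densities are sampled on a uniform grid over $[-\pi, \pi)$ so that $\frac{1}{2\pi}\int$ is approximated by a sample mean. The two example densities, the grid size, and the helper name `l1_reconcile` are illustrative assumptions and do not come from the text.

```python
import numpy as np

def l1_reconcile(f_y, f_yhat):
    """Perturbation spectra (3a)-(3b) and their combined variance.

    Inputs are samples of two spectral densities on a uniform grid over
    [-pi, pi); (1/2pi) * integral is approximated by the sample mean.
    """
    f_y = np.asarray(f_y, dtype=float)
    f_yhat = np.asarray(f_yhat, dtype=float)
    f_psi = np.maximum(f_yhat - f_y, 0.0)      # (3a): fills in where f_y is smaller
    f_psihat = np.maximum(f_y - f_yhat, 0.0)   # (3b): fills in where f_yhat is smaller
    distance = f_psi.mean() + f_psihat.mean()  # E{psi^2} + E{psihat^2}
    return f_psi, f_psihat, distance

# Hypothetical example densities (both non-negative on the grid).
theta = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
f_y = 1.0 + 0.8 * np.cos(theta)
f_yhat = 1.0 + 0.8 * np.cos(2.0 * theta)

f_psi, f_psihat, d = l1_reconcile(f_y, f_yhat)
f_z = f_y + f_psi                    # spectrum of the common "sum" z_k
l1 = np.abs(f_y - f_yhat).mean()     # discretized L1 distance

print(np.allclose(f_z, np.maximum(f_y, f_yhat)))  # f_z is the pointwise maximum
print(d, l1)                                      # the two numbers agree
```

The two perturbation spectra have disjoint supports by construction, which is why their total mass adds up to the $L_1$ distance.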



3 A distance between covariance matrices 

It is often the case that only a finite segment of the autocorrelation function
of the time-series $y_k$ and $\hat y_k$ is available (and even then, possibly uncertain).
Thus, it is of interest to consider distances between the partial autocorrelation
statistics $\mathbf{R}$ and $\hat{\mathbf{R}}$. To this end, we follow the dictum of the
previous section and define as a distance measure the minimal combined variance of
random processes $\psi_k$ and $\hat\psi_k$ for which (1) holds. Naturally, since only
partial covariance samples are available, the random processes $\psi_k$, $\hat\psi_k$
and (1) need to be consistent with these data.

$^1$ It will be interesting to explore the practical significance of other possibilities for
quantifying distance such as

HAM ) + U(. )) M r UV) + U{0) de
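First denote by

$$\mathbf{R}_n := \begin{bmatrix} R_0 & R_1 & \cdots & R_{n-1}\\ R_{-1} & R_0 & \cdots & R_{n-2}\\ \vdots & \vdots & \ddots & \vdots\\ R_{-(n-1)} & R_{-(n-2)} & \cdots & R_0 \end{bmatrix}$$

the $n \times n$ covariance matrix corresponding to $y_k$ and the covariance samples in
$R_n$ and, similarly, $\hat{\mathbf{R}}_n$ for the Toeplitz matrix based on $\hat R_n$. If
$\mathbf{Q}_n$, $\hat{\mathbf{Q}}_n$ denote the corresponding finite Toeplitz covariances of
the random processes $\psi_k$ and $\hat\psi_k$ respectively, for which (1) holds, then

$$\mathbf{R}_n + \mathbf{Q}_n = \hat{\mathbf{R}}_n + \hat{\mathbf{Q}}_n \tag{6}$$

and the minimal sum $Q_0 + \hat Q_0$ of the respective variances can serve as a metric
quantifying the distance between $\mathbf{R}$ and $\hat{\mathbf{R}}$.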

The computation of $\mathbf{Q}_n$, $\hat{\mathbf{Q}}_n$ minimizing the sum $Q_0 + \hat Q_0$,
or equivalently minimizing

$$\frac{1}{n}\operatorname{trace}(\mathbf{Q}_n + \hat{\mathbf{Q}}_n),$$

is a convex problem since the positivity constraints are convex. The Toeplitz
structure is peripheral, and the idea of defining such metrics extends equally
well to non-negative definite Hermitian matrices and to more general positive
operators. For notational convenience we develop the framework in the context
of real symmetric matrices.
So, we let

$$\mathcal{M}_n := \{M \in \mathbb{R}^{n \times n} \mid M = M' \ge 0\}$$

be the cone of non-negative symmetric $n \times n$ matrices and

$$\mathcal{T}_n := \{R \in \mathcal{M}_n \mid R \text{ is a Toeplitz matrix}\}$$

be the cone of non-negative Toeplitz matrices in $\mathcal{M}_n$. We address the case of
matrices in $\mathcal{M}_n$ and define a suitable metric, which is then specialized to
$\mathcal{T}_n$.

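Proposition 1. Let $M_1, M_2 \in \mathcal{M}_n$ and

$$t(M_1, M_2) := \min\left\{ \tfrac{1}{n}\operatorname{trace}(M) \;\middle|\; M \in \mathcal{M}_n,\ M \ge M_1 \text{ and } M \ge M_2 \right\}.$$

Then

$$\delta(M_1, M_2) := 2t(M_1, M_2) - \tfrac{1}{n}\operatorname{trace}(M_1) - \tfrac{1}{n}\operatorname{trace}(M_2) \tag{7}$$

defines a metric on $\mathcal{M}_n$.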
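
For the unconstrained cone $\mathcal{M}_n$ the minimization in Proposition 1 can be carried out in closed form. This is a standard fact about positive and negative parts of a symmetric matrix that is not stated in the text, so it should be read as an assumption of the sketch: writing $D = M_1 - M_2 = D_+ - D_-$ with $D_\pm \ge 0$ obtained from the eigendecomposition, a minimal-trace element is $M_2 + D_+ = M_1 + D_-$, and then $\delta(M_1, M_2) = \frac{1}{n}\operatorname{trace}|M_1 - M_2|$. The function names below are illustrative.

```python
import numpy as np

def min_trace_upper_bound(M1, M2):
    """A minimal-trace symmetric M with M >= M1 and M >= M2 (no Toeplitz constraint)."""
    M1 = np.asarray(M1, dtype=float)
    M2 = np.asarray(M2, dtype=float)
    w, V = np.linalg.eigh(M1 - M2)            # D = V diag(w) V'
    D_plus = (V * np.maximum(w, 0.0)) @ V.T   # positive part of D
    return M2 + D_plus                        # equals M1 + D_minus

def delta(M1, M2):
    """delta(M1, M2) = (2 trace(M) - trace(M1) - trace(M2)) / n, as in (7)."""
    M1 = np.asarray(M1, dtype=float)
    M2 = np.asarray(M2, dtype=float)
    M = min_trace_upper_bound(M1, M2)
    return (2.0 * np.trace(M) - np.trace(M1) - np.trace(M2)) / M1.shape[0]

# Small hypothetical example: two diagonal matrices that dominate each other
# in different directions.
M1 = np.diag([2.0, 0.0])
M2 = np.diag([0.0, 1.0])
M12 = min_trace_upper_bound(M1, M2)
print(M12)              # the common upper bound
print(delta(M1, M2))    # normalized combined trace of the two perturbations
```

For this pair the upper bound is diag(2, 1) and the distance is 3/2, which is the trace of $|M_1 - M_2|$ divided by $n = 2$.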
Proof. Given $M_1, M_2 \in \mathcal{M}_n$,

$$\mathcal{C}(M_1, M_2) := \{M \mid M \ge M_1 \text{ and } M \ge M_2\}$$

is a closed convex set of non-negative definite matrices. It follows that there is
an element $M_{12} \in \mathcal{C}(M_1, M_2)$ having minimal trace.

Clearly $\delta(M_1, M_2)$ is symmetric in its arguments and takes positive values
unless $M_1 = M_2$, in which case $\delta(M_1, M_2) = 0$. Thus, we only need to prove
the triangle inequality. Given $M_i \in \mathcal{M}_n$ for $i \in \{1, 2, 3\}$, we denote by
$M_{ik}$ corresponding minimal elements as above for $i, k \in \{1, 2, 3\}$, and we let

$$\Delta_{ik} := M_{ik} - M_k.$$

These matrices are non-negative by construction, the identities

$$M_i + \Delta_{ki} = M_k + \Delta_{ik}$$

hold, and

$$\delta(M_i, M_k) = \frac{1}{n}\operatorname{trace}(\Delta_{ik} + \Delta_{ki})$$

for $i, k \in \{1, 2, 3\}$. But then,

$$M_1 + \Delta_{21} - \Delta_{12} = M_2 = M_3 + \Delta_{23} - \Delta_{32},$$

and hence,

$$M_1 + \Delta_{21} + \Delta_{32} = M_3 + \Delta_{23} + \Delta_{12}.$$

From the minimal property of $\Delta_{13}$ and of $\Delta_{31}$ with regard to having the
least value for the combined trace so that $M_1 + \Delta_{31} = M_3 + \Delta_{13}$, it
follows that

$$\operatorname{trace}(\Delta_{13} + \Delta_{31}) \le \operatorname{trace}(\Delta_{21} + \Delta_{32} + \Delta_{23} + \Delta_{12}).$$

Therefore,

$$\delta(M_1, M_2) + \delta(M_2, M_3) \ge \delta(M_1, M_3),$$

which completes the proof. □

We now observe that the steps of the proof of Proposition 1 permit incorporating
linear constraints on the structure of elements of $\mathcal{M}_n$, such as
the constraint of all matrices being Toeplitz. Hence, whereas $\delta(\cdot, \cdot)$ may be
used directly as a distance measure between elements of $\mathcal{T}_n$, the corresponding
minimal-trace perturbations $\Delta_{ik}$ may not belong to $\mathcal{T}_n$ in general. But,
since the Toeplitz property is a linear constraint, we may define a completely
analogous distance measure enforcing such perturbations (if so desired) to be
Toeplitz.
Proposition 2. Let $M_1, M_2 \in \mathcal{T}_n$ and

$$t_{\mathcal{T}}(M_1, M_2) := \min\left\{ \tfrac{1}{n}\operatorname{trace}(M) \;\middle|\; M \in \mathcal{T}_n, \text{ and } M \ge M_1,\ M \ge M_2 \right\}.$$

Then

$$\delta_{\mathcal{T}}(M_1, M_2) := 2t_{\mathcal{T}}(M_1, M_2) - \tfrac{1}{n}\operatorname{trace}(M_1) - \tfrac{1}{n}\operatorname{trace}(M_2) \tag{8}$$

defines a metric on $\mathcal{T}_n$.

Proof. The proof follows the steps of the proof of Proposition 1 verbatim,
except for the fact that we now constrain all matrices to belong to $\mathcal{T}_n$. □

Proposition 3. Let $f_y$, $f_{\hat y}$ be power spectral densities, i.e., nonnegative and
integrable on $[-\pi, \pi]$. Let as before $\mathbf{R}_n$, $\hat{\mathbf{R}}_n$ denote the
corresponding Toeplitz covariance matrices, and let $n \in \{1, 2, \ldots\}$. Then

$$\lim_{n \to \infty} \delta_{\mathcal{T}}(\mathbf{R}_n, \hat{\mathbf{R}}_n) = \|f_y - f_{\hat y}\|_1.$$

Proof. Clearly

$$\lim_{n \to \infty} \delta_{\mathcal{T}}(\mathbf{R}_n, \hat{\mathbf{R}}_n) \le \|f_y - f_{\hat y}\|_1$$

since a choice of $\psi_k$ and $\hat\psi_k$ with power spectra as in (3a)-(3b) gives rise to
partial covariance matrices $\mathbf{Q}_n$, $\hat{\mathbf{Q}}_n$, for all $n$, for which (6)
holds. The respective 0th elements $Q_0$ and $\hat Q_0$ remain the same for all $n$ and the
right-hand side is

$$\|f_y - f_{\hat y}\|_1 = Q_0 + \hat Q_0$$

since the power spectra in (3a)-(3b) have no overlap in their support.

To show the converse inequality, consider the sequence of minimizing $\mathbf{Q}_n$,
$\hat{\mathbf{Q}}_n$. These are Toeplitz matrices with bounded entries (since their
corresponding 0th element is bounded by $\|f_y - f_{\hat y}\|_1$). Each can be extended to an
infinite Toeplitz matrix, and thereby, gives rise to power spectral densities $q_n$ and
$\hat q_n$ such that the first $n$ Fourier coefficients of $f_y + q_n$ and
$f_{\hat y} + \hat q_n$ coincide. The spectral densities $q_n$ and $\hat q_n$ can be obtained
from $\mathbf{Q}_n$, $\hat{\mathbf{Q}}_n$ by any particular positive extension, for instance a
"maximum entropy" one. We can take those as pairs, and since they are bounded there
exists a subsequence weakly convergent to non-negative measures, $d\mu$ and $d\hat\mu$,
such that

$$f_y\, d\theta + d\mu = f_{\hat y}\, d\theta + d\hat\mu$$

since their Fourier coefficients must coincide. If $d\mu$, $d\hat\mu$ do have singular parts
then these should be identical and the absolutely continuous parts must balance as
well, so there exist power spectral densities $q$ and $\hat q$ such that

$$f_y + q = f_{\hat y} + \hat q. \tag{9}$$
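But then,

$$\lim_{n \to \infty} \delta_{\mathcal{T}}(\mathbf{R}_n, \hat{\mathbf{R}}_n) \ge \lim_{n \to \infty} \frac{1}{2\pi}\int_{-\pi}^{\pi} \left(q_n(\theta) + \hat q_n(\theta)\right) d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi} (d\mu + d\hat\mu) \ge \|f_y - f_{\hat y}\|_1,$$

the last inequality from (9). □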
3.1 An example 

The metric $\delta_{\mathcal{T}}(\mathbf{R}_n, \hat{\mathbf{R}}_n)$ of the previous section admits
no simple expression in terms of the respective eigenvalues. This should be contrasted
with its limiting value $d(f_y, f_{\hat y})$ which is the $L_1$ distance between the
corresponding power spectral densities. We highlight this with an example.
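Let

$$\mathbf{R}_3 = \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad \hat{\mathbf{R}}_3 = \begin{bmatrix} 1 & 1/2 & 1/2\\ 1/2 & 1 & 1/2\\ 1/2 & 1/2 & 1 \end{bmatrix}.$$

Then, clearly,

$$\mathbf{Q}_3 = \begin{bmatrix} x & y & y\\ y & x & y\\ y & y & x \end{bmatrix} \quad\text{and}\quad \hat{\mathbf{Q}}_3 = \begin{bmatrix} x & v & v\\ v & x & v\\ v & v & x \end{bmatrix},$$

where

$$1 + y = 1/2 + v$$

and $x$ is minimal subject to $\mathbf{Q}_3 \ge 0$ as well as $\hat{\mathbf{Q}}_3 \ge 0$. The last
two inequalities imply that $-x \le y, v \le x$. More precisely, the eigenvalues $x + 2y$
and $x - y$ of $\mathbf{Q}_3$ (and likewise $x + 2v$ and $x - v$ of $\hat{\mathbf{Q}}_3$) must be
non-negative, so that $v - y \le 3x/2$ and hence, since $v - y = 1/2$,

$$\frac{1}{3} \le x.$$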
It follows that the optimal choice (minimal $x$) is

$$x = 1/3, \quad y = -1/6, \quad v = 1/3.$$

Then

$$\delta_{\mathcal{T}}(\mathbf{R}_3, \hat{\mathbf{R}}_3) = 2x = 2/3,$$

while the respective eigenvalues are

$$\operatorname{spec}(\mathbf{R}_3) = \{3, 0, 0\} \quad\text{and}\quad \operatorname{spec}(\hat{\mathbf{R}}_3) = \{2, 1/2, 1/2\}.$$

It appears that there is no simple expression for $\delta_{\mathcal{T}}(\mathbf{R}_3, \hat{\mathbf{R}}_3)$
based solely on knowledge of $\operatorname{spec}(\mathbf{R}_3)$ and
$\operatorname{spec}(\hat{\mathbf{R}}_3)$.

The covariance $\mathbf{R}_3$ has a unique extension and corresponds to a measure
with unit weight at $\theta = 0$, i.e., a spectral line (Dirac delta) at $\theta = 0$.
Assuming that $\hat{\mathbf{R}}_3$ originates from a spectral measure which has a similar
weight of amplitude 1/2 at $\theta = 0$ and a uniform absolutely continuous part of
amplitude 1/2, then

$$\|d\mu - d\hat\mu\|_1 = 1/2 + 1/2 = 1$$

adding the $L_1$-norm of the difference of the absolutely continuous parts with
the absolute integral of the discrepancy between the two singular parts. We leave it
as an exercise to the reader to verify that if $\hat{\mathbf{R}}_n$ is as we just assumed,
namely $\hat R_0 = 1$ and $\hat R_k = 1/2$ for $k \ge 1$, and similarly, $R_k = 1$ for all $k$,
then $\delta_{\mathcal{T}}(\mathbf{R}_n, \hat{\mathbf{R}}_n) \to 1$ as $n \to \infty$.
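
The example and its limit can be explored numerically. The sketch below rests on two observations that are not spelled out in the text and should be read as assumptions of the sketch. First, without the Toeplitz constraint the distance of Proposition 1 equals $\frac{1}{n}\operatorname{trace}|M_1 - M_2|$. Second, for this particular pair the positive and negative parts of $\mathbf{R}_n - \hat{\mathbf{R}}_n$ are themselves Toeplitz, so the unconstrained minimizer is also feasible for the Toeplitz-constrained problem and the two distances coincide. The function names are illustrative.

```python
import numpy as np

def delta(M1, M2):
    """(1/n) * trace|M1 - M2|: the unstructured distance of Proposition 1."""
    eigs = np.linalg.eigvalsh(np.asarray(M1, float) - np.asarray(M2, float))
    return np.abs(eigs).sum() / len(eigs)

def example_pair(n):
    """Toeplitz matrices with R_k = 1 for all k, and R_hat_0 = 1, R_hat_k = 1/2 (k >= 1)."""
    R = np.ones((n, n))
    R_hat = 0.5 * np.ones((n, n)) + 0.5 * np.eye(n)
    return R, R_hat

R3, R3_hat = example_pair(3)
print(np.linalg.eigvalsh(R3))       # spectrum of R_3
print(np.linalg.eigvalsh(R3_hat))   # spectrum of R_hat_3
print(delta(R3, R3_hat))            # distance for n = 3

for n in (3, 10, 100, 1000):
    R, R_hat = example_pair(n)
    print(n, delta(R, R_hat))       # grows towards 1 as n increases
```

For this pair the difference $\mathbf{R}_n - \hat{\mathbf{R}}_n$ has one eigenvalue $(n-1)/2$ and $n-1$ eigenvalues equal to $-1/2$, so the computed distance is $(n-1)/n$, in agreement with the value 2/3 for $n = 3$ and with the limit 1.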



4 Approximating sample covariances 



It is often the case that the autocovariance matrix $\hat{\mathbf{R}}_n$ of a random process
$y_k$ is estimated in a way that does not guarantee this to be Toeplitz. For
instance, it is quite common for $\hat{\mathbf{R}}_n$ to be estimated by averaging observation
samples

$$\hat{\mathbf{R}}_n := \frac{1}{N}\sum_{\ell=0}^{N-1} \begin{bmatrix} y_{1+\ell}\\ \vdots\\ y_{n+\ell} \end{bmatrix} \begin{bmatrix} y_{1+\ell} & \cdots & y_{n+\ell} \end{bmatrix},$$

where $N$ is the number of length-$n$ windows being averaged.

The estimate $\hat{\mathbf{R}}_n$ is non-negative definite by construction, but may not be
Toeplitz. Yet, for purposes of analysis it is often beneficial to approximate
$\hat{\mathbf{R}}_n$ by a Toeplitz one, or possibly, by one with additional structure (e.g.,
corresponding to a moving average process or, more generally, to the state
of a known dynamical system). The problem of seeking such an approximant
which is closest to $\hat{\mathbf{R}}_n$ in $\delta(\cdot, \cdot)$ is readily solvable via convex
optimization.
4.1 Comparison with the von Neumann entropy

In p] , the question was raised as to what are appropriate ways to approximate
a given sample covariance with one that abides by a known linear structure.
It was proposed that the Kullback-Leibler-von Neumann distance

$$\mathbb{S}(\mathbf{R} \| \hat{\mathbf{R}}) := \operatorname{trace}\left(\mathbf{R}\left(\log \mathbf{R} - \log \hat{\mathbf{R}}\right)\right)$$

provides a convenient convex functional for which the optimal approximant is
uniquely defined. An academic example was presented in pQ which is recapitulated
here as it helps underscore differences with approximation in the sense
of minimizing $\delta(\mathbf{R}, \hat{\mathbf{R}})$.

Consider the positive-definite matrix below as the estimated value for a
covariance matrix

$$\hat{\mathbf{R}}_3 = \begin{bmatrix} 1.1 & .9 & 1.05\\ .9 & .8 & .9\\ 1.05 & .9 & 1.1 \end{bmatrix}.$$

The minimizer of

$$\{\mathbb{S}(\mathbf{R} \| \hat{\mathbf{R}}_3) \mid \mathbf{R} \text{ being Toeplitz},\ \mathbf{R} \ge 0,\ \operatorname{trace}(\mathbf{R}) = \operatorname{trace}(\hat{\mathbf{R}}_3)\}$$

is unique (see [1]) and given by

$$\mathbf{R}_{3,\mathrm{vN}} = \begin{bmatrix} 1 & .942 & .957\\ .942 & 1 & .942\\ .957 & .942 & 1 \end{bmatrix}.$$

It is interesting to point out that the closest Toeplitz matrix to $\hat{\mathbf{R}}_3$ in the
least-squares sense fails to be positive-definite ([1], cf. [2]). On the other hand, the
optimal approximant in the $\delta(\cdot, \cdot)$-sense can be obtained by observation and is
equal to

$$\mathbf{R}_{3,\delta} = \begin{bmatrix} 1.1 & .9 & 1.05\\ .9 & 1.1 & .9\\ 1.05 & .9 & 1.1 \end{bmatrix}.$$

In the above, a second subscript indicates the sense in which the matrix
approximates $\hat{\mathbf{R}}_3$. Obviously the traces of $\mathbf{R}_{3,\delta}$ and
$\hat{\mathbf{R}}_3$ are not the same, in general. However, equality of the traces can be
easily imposed as an added linear constraint.
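
Two of the claims in this example are easy to check numerically. The sketch below assumes that the least-squares (Frobenius norm) Toeplitz approximant is obtained by averaging each diagonal, and that without structural constraints the distance of Proposition 1 equals $\frac{1}{n}\operatorname{trace}|M_1 - M_2|$; neither formula appears in the text, and the function names are illustrative.

```python
import numpy as np

def delta(M1, M2):
    """(1/n) * trace|M1 - M2|: the unstructured distance of Proposition 1."""
    eigs = np.linalg.eigvalsh(np.asarray(M1, float) - np.asarray(M2, float))
    return np.abs(eigs).sum() / len(eigs)

def toeplitz_least_squares(M):
    """Frobenius-norm projection onto Toeplitz matrices: average each diagonal."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    idx = np.subtract.outer(np.arange(n), np.arange(n))  # i - j labels the diagonals
    T = np.empty_like(M)
    for k in range(-(n - 1), n):
        T[idx == k] = M[idx == k].mean()
    return T

R3_hat = np.array([[1.10, 0.90, 1.05],
                   [0.90, 0.80, 0.90],
                   [1.05, 0.90, 1.10]])
R3_delta = np.array([[1.10, 0.90, 1.05],
                     [0.90, 1.10, 0.90],
                     [1.05, 0.90, 1.10]])

R3_ls = toeplitz_least_squares(R3_hat)
print(np.linalg.eigvalsh(R3_hat))    # all positive: the estimate is positive definite
print(np.linalg.eigvalsh(R3_ls))     # smallest eigenvalue is negative
print(delta(R3_hat, R3_delta))       # distance to the Toeplitz approximant above
print(np.trace(R3_hat), np.trace(R3_delta))  # the traces differ
```

The least-squares projection has the eigenvector $(1, 0, -1)'$ with eigenvalue $1 - 1.05 = -0.05$, which is why it fails to be positive definite, whereas the approximant $\mathbf{R}_{3,\delta}$ differs from $\hat{\mathbf{R}}_3$ only in the middle diagonal entry.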

4.2 Structured covariances 

For purposes of illustration, consider a moving average process

$$y_k = w_k + w_{k-1} + w_{k-2}$$

where $w_k$ is a zero-mean, unit-variance, Gaussian white noise process. The
autocorrelation sequence of $y_k$ is

$$[R_0\ R_1\ R_2\ R_3\ \ldots] = [3\ 2\ 1\ 0\ \ldots].$$

Simulating $y_k$ over a window $k \in \{0, 1, \ldots, 100\}$, and based on a particular
such realization, the corresponding $n \times n$ sample covariance matrix, for $n = 5$,
was computed to be

$$\hat{\mathbf{R}}_5 = \begin{bmatrix}
4.0362 & 2.9053 & 1.8043 & 0.4042 & 0.1718\\
2.9053 & 4.0547 & 2.9268 & 1.7945 & 0.3800\\
1.8043 & 2.9268 & 4.0792 & 2.9143 & 1.7733\\
0.4042 & 1.7945 & 2.9143 & 4.0819 & 2.9421\\
0.1718 & 0.3800 & 1.7733 & 2.9421 & 4.0237
\end{bmatrix}.$$

Obviously, this matrix is not Toeplitz due to the finiteness of the observation
record. The closest Toeplitz approximant to $\hat{\mathbf{R}}_5$, in the sense of the metric
$\delta(\cdot, \cdot)$, turns out to be

$$\mathbf{R}_{5,\mathrm{Toeplitz}} = \begin{bmatrix}
4.0677 & 2.9237 & 1.7912 & 0.3979 & 0.1822\\
2.9237 & 4.0677 & 2.9237 & 1.7912 & 0.3979\\
1.7912 & 2.9237 & 4.0677 & 2.9237 & 1.7912\\
0.3979 & 1.7912 & 2.9237 & 4.0677 & 2.9237\\
0.1822 & 0.3979 & 1.7912 & 2.9237 & 4.0677
\end{bmatrix}$$

for which

$$\delta(\hat{\mathbf{R}}_5, \mathbf{R}_{5,\mathrm{Toeplitz}}) = 0.0308.$$
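
The distance between the two matrices above can be evaluated directly from their printed entries. The sketch assumes the identity $\delta(M_1, M_2) = \frac{1}{n}\operatorname{trace}|M_1 - M_2|$ for the unstructured metric of Proposition 1, which is not stated in the text. Since the entries are rounded to four decimals, the computed value can only be compared with the reported one up to rounding.

```python
import numpy as np

R5_hat = np.array([
    [4.0362, 2.9053, 1.8043, 0.4042, 0.1718],
    [2.9053, 4.0547, 2.9268, 1.7945, 0.3800],
    [1.8043, 2.9268, 4.0792, 2.9143, 1.7733],
    [0.4042, 1.7945, 2.9143, 4.0819, 2.9421],
    [0.1718, 0.3800, 1.7733, 2.9421, 4.0237],
])

# First column of the Toeplitz approximant; entry (i, j) is r[|i - j|].
r = np.array([4.0677, 2.9237, 1.7912, 0.3979, 0.1822])
lag = np.abs(np.subtract.outer(np.arange(5), np.arange(5)))
R5_toeplitz = r[lag]

# Sum of absolute eigenvalues of the difference, normalized by n = 5.
eigs = np.linalg.eigvalsh(R5_hat - R5_toeplitz)
delta_value = np.abs(eigs).sum() / 5
print(delta_value)   # to be compared with the value reported in the text
```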

Interestingly, $\mathbf{R}_{5,\mathrm{Toeplitz}}$ does not correspond to a moving average
process of order 2 (or even, of order 3, 4) as can be readily verified by the fact that
the trigonometric polynomials, e.g.,

$$\sum_{k=-4}^{4} R_k e^{jk\theta},$$

take negative values.
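
The sign test can be carried out on a grid of frequencies. The sketch below evaluates the polynomial for the Toeplitz approximant (first row 4.0677, 2.9237, 1.7912, 0.3979, 0.1822) and, for comparison, for the exact autocorrelation 3, 2, 1 of the process $y_k = w_k + w_{k-1} + w_{k-2}$. The grid and the function name are illustrative choices.

```python
import numpy as np

def spectral_polynomial(r, theta):
    """Evaluate sum_{k=-m}^{m} R_k e^{jk theta} = R_0 + 2 sum_{k>=1} R_k cos(k theta)
    for a real symmetric sequence given by r = [R_0, R_1, ..., R_m]."""
    r = np.asarray(r, dtype=float)
    k = np.arange(1, len(r))
    return r[0] + 2.0 * np.cos(np.outer(theta, k)) @ r[1:]

# Grid with step pi/1800, which contains the points theta = +/- 2*pi/3.
theta = np.linspace(-np.pi, np.pi, 3601)

p_toeplitz = spectral_polynomial([4.0677, 2.9237, 1.7912, 0.3979, 0.1822], theta)
p_exact = spectral_polynomial([3.0, 2.0, 1.0], theta)

print(p_toeplitz.min())   # negative near theta = 2*pi/3
print(p_exact.min())      # zero up to rounding: |1 + e^{j theta} + e^{2j theta}|^2
```

At $\theta = 2\pi/3$ the first polynomial evaluates to $4.0677 - 2.9237 - 1.7912 + 2(0.3979) - 0.1822 = -0.0336$, so it cannot be the power spectrum of a moving average process of order 4 or less.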

The set of covariance matrices which are generated by moving average
processes of a given order is convex and admits a characterization via a set of
linear matrix inequalities ([8, 3]). Thus, the closest approximant to $\hat{\mathbf{R}}$ which
corresponds to a moving average process of any given order can be readily
computed. In particular, if we specify the order to be 2, then the optimal
approximant to $\hat{\mathbf{R}}_5$ becomes

$$\mathbf{R}_{5,\mathrm{MA}(2)} = \begin{bmatrix}
3.9945 & 2.1588 & 0.5693 & 0 & 0\\
2.1588 & 3.9945 & 2.1588 & 0.5693 & 0\\
0.5693 & 2.1588 & 3.9945 & 2.1588 & 0.5693\\
0 & 0.5693 & 2.1588 & 3.9945 & 2.1588\\
0 & 0 & 0.5693 & 2.1588 & 3.9945
\end{bmatrix}$$

for which

$$\delta(\hat{\mathbf{R}}_5, \mathbf{R}_{5,\mathrm{MA}(2)}) = 1.2161.$$